Combinatorial chemistry is a tool of [...]. For this technology to be fully exploited, it will be [...] screening capacity. [...] be constructed greatly [...] already in existence. Similarly, if we [are] to [...] optimization, it is important [...] scoring functions that [can be] modified to suit particular projects. [These challenges are] addressed through the introduction of a new program, HARPick (heuristic algorithm for reagent picking). [...] diversity [...] accessible to the bench chemist, and incorporates significant advances over [existing] diversity [...] approaches.



Introduction 

[...] combinatorial synthesis [...], the need for computational tools that aid [library] design has become acute.1 To [address] this, we have developed new methods [...] measurement and compound selection. In this paper, we first highlight some concerns with existing methodology and then describe our [...] HARPick (heuristic algorithm for reagent picking) program. The results [...] are illustrated and discussed critically, and [...] for [...].



Combinatorial Libraries 

[The choice of the appropriate] technology to use is dependent on [the] resources and goals of the project. In [this context], combinatorial chemistry can be defined as the process of making all possible combinations of a [set of] reagents using a given reaction. Using this definition, it is easy to see how the number of possible products will greatly exceed the resources of even the most prolific organization. If we take the example of an amide condensation, the available reagents from a commercial [database] lead to the selection of over 3000 amine synthons and [...] acid synthons. Combining these [...] starting point [...]. [...] need to [...] 96-well plates, and the need to dispose of [...] (possibly radioactive) waste [that would] be generated by the screening.
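To make the scale argument concrete, the short Python sketch below counts the products of a fully enumerated two-component library and the number of 96-well plates needed to hold them, assuming one compound per well. The 3000 amines come from the text above; the acid count (`N_ACIDS_HYPOTHETICAL`) is a hypothetical placeholder, since that figure is not legible here.

```python
from math import ceil, prod

def library_size(component_sizes):
    """Products in a fully enumerated library: every reagent combined with every other."""
    return prod(component_sizes)

def plates_needed(n_compounds, wells_per_plate=96):
    """96-well plates required, assuming one compound per well (an illustrative assumption)."""
    return ceil(n_compounds / wells_per_plate)

N_AMINES = 3000               # from the text: "over 3000 amine synthons"
N_ACIDS_HYPOTHETICAL = 1000   # hypothetical: the acid count is not legible in the source

n_products = library_size([N_AMINES, N_ACIDS_HYPOTHETICAL])
print(n_products, plates_needed(n_products))
```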

[...] strategy [...]. The question of what [constitutes] molecular diversity [must be addressed] in terms relevant to [...] interactions.2 These measures include [...] on a 3D lattice,3 2D/3D fingerprints, and [...]. [Several studies] have attempted to assess descriptor quality [...] by profiling [...]. Descriptors were ranked [by their ability] to discriminate active and inactive compounds [in] a number of medicinal chemistry [data sets]. In these studies, it was suggested that 2D [...] descriptors [...] alternatives such as 3D pharmacophore fingerprints. From our own perspective, such [measures] of [descriptor] quality are rather [...]: 2D substructure searches are used routinely to extract [close] analogues from databases. Similarly, measurement of [...] calculations.9 A capacity to distinguish active from inactive analogues from a single biological screen, [...] at the nanomolar level, is hardly proof of an [...]. [...] between heterogeneous activity classes [...]. Within a single activity class, differences as small as a methyl group can have a significant effect [...] the structural differences that exist between [...]. However, [...] of such studies could have been predicted [...] may be [...].

Pharmacophore Descriptors. A pharmacophore, defined here as the [...], is an efficient descriptor for [...] interactions, defining a necessary [but not] sufficient condition for biological activity. [...] the literature on searching for novel [leads in] databases, the general consensus suggested [...] the descriptors of [...]. [...] searching within databases of known compounds have proven this in a practical [setting]. It therefore seems reasonable to employ such descriptors for molecular diversity calculations, [since a library] providing good coverage of accessible pharmacophore space should also prove a good source of [...].

[An] advantage of pharmacophores is that they are a whole molecule descriptor [...] the concept of conformational [flexibility]. [Pharmacophores are] measured on the products of the combinatorial reaction rather than the reagents, a [...] more computationally expensive choice, mainly because of the numbers involved. There are [several] reasons for making this choice. [First,] reagent-based descriptors and diversity assessments are based on the assumption that the properties of fragments are independent when assessing diversity for each [component] of a combinatorial library. This [does not hold for] pharmacophore-based (or most other 3D) fingerprints. [Second,] it has been shown that when [...] data sets employing descriptors derived from [products], the resulting compound selections [...] more efficiently than comparable calculations utilizing reagents.13 Third, care must [be taken] with reagent-based [...] to [...] interlibrary comparisons [...] are suitable for [...]. [A similar] argument [applies] to the choice between clustering and partitioning data. In contrast, the pharmacophore descriptor is ideal for interlibrary comparisons. [As for] the speed of the calculations, our practical experience is that library design is not the rate-limiting [step] in combinatorial synthesis, so that there [is] [...] benefit to performing product-based calculations of the type that we describe below. For all of these reasons, and since we are concerning ourselves [here] with such general screening libraries, we have chosen to use pharmacophores as our primary descriptor and have developed product-oriented methods, despite the extra computational cost.

Two basic procedures are [commonly used for compound selection in] descriptor space. These involve the application of (i) clustering [methods] and (ii) cell-based partitioning [methods] for compound selection. Clustering can be defined as the division of a [data set] into clusters with high intracluster similarity and [high intercluster] dissimilarity. Such techniques have been used for many years for generating diverse [subsets] for screening. Partitioning involves the [division] of property space into a number of [cells] and the selection of an [...] subset for maximal coverage of these property bins. This [...] data sets [...] cleanly divided up [...] property space [...] discontinuously. However, [...] we anticipate the saturation [of] property space, and hence discontinuity should [not] be a major problem. Partitioning techniques have the [advantage] of providing a convenient common frame of reference in property space, making comparison [between] libraries a simple process. A further [advantage] of cell-based methodology is that [calculation] times tend to scale linearly with the number of [molecules] being processed, making the partitioning paradigm [more] suitable (faster) for large data sets. A problem [with] clustering specific to the descriptors used [in our] studies is that it is difficult to employ a pharmacophore [fingerprint] for clustering calculations (a [problem] encountered in one of the descriptor comparison studies). This is because the fingerprint [is calculated] on a per-molecule basis, making the similarity measure very (i) discontinuous and (ii) sensitive to the number of pharmacophores present in the molecule, which varies approximately as the cube of the number of pharmacophore centers. Small molecular [changes] can thus potentially lead to large differences in the fingerprint.

As we wished [to apply] pharmacophore descriptors over large [data sets] and [to] undertake interlibrary comparison, we chose to use a partition-based approach.



The Chem-Diverse Approach 

[Much work has been] undertaken into the [use] of pharmacophores as molecular descriptors.16,17 Recently, [a] program, Chem-Diverse, [has made use of] pharmacophore triplet information in diversity [analysis]. Chem-Diverse provides a variety of [tools for] diversity assessment by pharmacophore and is [becoming] an industry standard for this aspect of [...]. The Chem-Diverse [approach to] diversity is based on trying to obtain the maximum coverage of pharmacophore space by potential combinatorial chemistry products (Figure 1). [In] general, [...] with our [...] approach to assessing pharmacophore [diversity]. [However,] the current version of Chem-Diverse suffers from [a number of] drawbacks which need to be addressed. These are discussed below.

Limitations in Compound Selection. [...] Diverse compound selection [in] Chem-Diverse [...] conformational [...] potential library products (see Figure 1). Any pharmacophores found are added to a single pharmacophore key, which describes the ensemble of selected compounds. Compounds are only selected if the set of pharmacophores they express overlaps with the ensemble [key] by less than a user-defined amount, that is, if the molecule contains a significant number of previously unseen pharmacophores. As a consequence, the results of such searches are dependent on the order in which molecules are extracted from the database (analogous to the single-pass clustering algorithm).
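The order dependence can be seen in a minimal Python sketch of a single-pass, overlap-limited selection. It assumes that "overlap" means the fraction of a molecule's pharmacophores already present in the ensemble key; the exact measure used inside Chem-Diverse may differ, and the molecule names and pharmacophore ids are hypothetical.

```python
def single_pass_select(molecules, max_overlap):
    """Order-dependent selection: accept a molecule only if the fraction of its
    pharmacophores already present in the ensemble key is below max_overlap."""
    ensemble = set()          # binary ensemble key: pharmacophores seen so far
    selected = []
    for name, key in molecules:
        key = set(key)
        if not key:
            continue          # a molecule with no pharmacophores contributes nothing
        overlap = len(key & ensemble) / len(key)
        if overlap < max_overlap:
            selected.append(name)
            ensemble |= key
    return selected, ensemble

# Hypothetical molecules; 60% is the overlap limit used later in study 2.
mols = [("A", {1, 2, 3}), ("B", {3, 4}), ("C", {1, 2, 3, 4, 5})]
print(single_pass_select(mols, 0.6)[0])                   # ['A', 'B']
print(single_pass_select(list(reversed(mols)), 0.6)[0])   # ['C']
```

The same three molecules give two different selections (and ensemble keys of different size) purely because of the order in which they are read.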
Figure 1. [...] of the potential products of [a] combinatorial synthesis [...] paradigm for Chem-Diverse calculations. Key: [...] donor, Ar = aromatic ring centroid, + = centroid of a group of atoms with [a] positive charge.

[...] of reagents present in the selected [products]. This is crucial if control is to be exerted over the number of reagents required for any particular library. Chem-Diverse [selects] products based on what it considers to be the [most diverse] set of products ("cherry-picking"), with no explicit reference to the constituent reagents. [As a result,] Chem-Diverse will often make an inefficient selection of products. By inefficient, we mean that, for example, when selecting 100 products from a two-component combinatorial library (for example, the amide condensation discussed earlier), rather than choose an efficient 100% 10 × 10 reagent set, Chem-Diverse will instead choose compounds from a larger reagent subset, say 30 × 20. Using such a [reagent set] would be more costly and more [time-consuming], [placing a greater] burden upon the synthesis robot, and is thus termed inefficient. To achieve an efficient [selection] from a virtual combinatorial library using Chem-Diverse, [...].
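One way to quantify the inefficiency described above is sketched below in Python: count the distinct reagents a product selection touches, and compare the number of selected products with the full array those reagents would make. The function names and the efficiency ratio are illustrative definitions, not Chem-Diverse or HARPick output.

```python
def reagents_used(products):
    """Distinct reagents per component in a set of (component-1 reagent, component-2 reagent) products."""
    return len({a for a, _ in products}), len({b for _, b in products})

def array_efficiency(products):
    """Selected products / products in the full array spanned by the reagents used (1.0 = complete array)."""
    n1, n2 = reagents_used(products)
    return len(set(products)) / (n1 * n2)

# An efficient 10 x 10 array: 100 products from 20 reagents.
array = [(i, j) for i in range(10) for j in range(10)]
# A hypothetical cherry-picked set of 100 distinct products touching 30 x 20 reagents.
picked = [(k % 30, k // 5) for k in range(100)]

print(reagents_used(array), array_efficiency(array))    # (10, 10) 1.0
print(reagents_used(picked), array_efficiency(picked))  # (30, 20) and about 0.167
```

In the cherry-picked case, 50 reagents must be acquired and handled to make 100 compounds, and the same reagents could have made 600.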

Within Chem-Diverse, [...] the inclusion of additional molecular properties, such as shape, [in the selection criteria] is not currently feasible, [since] the diversity function employed by Chem-Diverse [cannot accommodate] nonpharmacophoric properties. [It is] possible to remove unwanted reagents by assigning upper and lower bounds for given properties [of the] products. Such an approach [has] limitations in compound selection, however, since it is not possible to devise an objective [criterion for reagent] removal which is [based on] the product descriptors. This is important because not all products created by a given reagent will be undesirable. Indeed, it may [be that] only [a few products] are poor, with the remaining products adding much to the diversity of the library.

Limitations in [the] Pharmacophore Keys. The Chem-Diverse pharmacophore [keys] are potentially extremely useful tools for directing library design. However, they are [limited] in [their current] incarnation, as the [key] only registers whether or not a particular pharmacophore [is present] in the selected molecular ensemble, not [how many times it] is found. The creation of a nonbinary [key in] Chem-Diverse has been developed [...]. However, the current [...] version still does not exploit the nonbinary pharmacophore data to constrain the construction of [libraries], making the key prone to saturation, even when small distance bins are applied, as is the case with the default bin settings in Chem-Diverse.

Methodology 

The basic outline of HARPick is illustrated in Figure 2. A number of features have been incorporated into the software to overcome many of the problems associated with Chem-Diverse. These are [described] below.

[...] To remove order dependence (product selection is dependent on the order in which the products are processed [in] Chem-Diverse calculations) and to allow reagent selection [while maximizing] product diversity, an alternative [optimization] technique was required. We chose simulated annealing as our method, as it has a proven track record [of] locating good (and hopefully near global) minima [on] a complicated energy surface (in this case, the diversity function score), and [it is easily] incorporated into our diversity profiling [routine]. Our implementation is based on a standard [algorithm] employing fixed-length Markov chains and dynamic cooling. Essentially, any changes [in] reagent selection which result in a reduction in the energy function are accepted, while changes producing an increase in the energy function (ΔE > 0) are accepted with a probability exp(-ΔE/T), where T is the annealing control parameter. In addition, a simple minimizer was also included. The minimization procedure only accepts [downhill moves] and terminates after failing to find a new minimum for a predefined number of Markov chains.
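The annealing loop described above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the HARPick code: the product keys are random (`toy_keys`), the energy is simply minus the number of unique pharmacophores, a move swaps one selected reagent for an unused one in a single component, and simple geometric cooling (T multiplied by `alpha` after each fixed-length chain) stands in for the dynamic cooling schedule mentioned in the text.

```python
import math
import random

def toy_keys(n1, n2, n_pharm=200, per_product=8, seed=1):
    """Hypothetical product keys: each (reagent1, reagent2) product expresses random pharmacophore ids."""
    rng = random.Random(seed)
    return {(a, b): frozenset(rng.sample(range(n_pharm), per_product))
            for a in range(n1) for b in range(n2)}

def unique_count(selection, keys):
    """Distinct pharmacophores over all products of the selected two-component reagent subsets."""
    sel1, sel2 = selection
    seen = set()
    for a in sel1:
        for b in sel2:
            seen |= keys[(a, b)]
    return len(seen)

def anneal(pool_sizes, subset_sizes, keys, t0=5.0, alpha=0.9,
           n_chains=30, chain_length=50, seed=0):
    """Pick a k1 x k2 reagent subset by simulated annealing; energy = -unique_count."""
    rng = random.Random(seed)
    sel = [rng.sample(range(n), k) for n, k in zip(pool_sizes, subset_sizes)]
    energy = -unique_count(sel, keys)
    best, best_energy = [s[:] for s in sel], energy
    t = t0
    for _ in range(n_chains):                  # fixed-length Markov chains
        for _ in range(chain_length):
            comp = rng.randrange(len(sel))     # component whose reagent list is perturbed
            unused = [r for r in range(pool_sizes[comp]) if r not in sel[comp]]
            if not unused:
                continue
            trial = [s[:] for s in sel]
            trial[comp][rng.randrange(len(trial[comp]))] = rng.choice(unused)
            trial_energy = -unique_count(trial, keys)
            delta = trial_energy - energy
            # Metropolis test: downhill always accepted, uphill with probability exp(-dE/T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                sel, energy = trial, trial_energy
                if energy < best_energy:
                    best, best_energy = [s[:] for s in sel], energy
        t *= alpha                             # geometric cooling between chains
    return best, -best_energy

keys = toy_keys(20, 15)
best, score = anneal((20, 15), (5, 4), keys)
print(best, score)
```

Because moves act on reagent lists while the score is computed over every product of those lists, the result is always a complete k1 x k2 array, which is the property the cherry-picking approach lacks.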

[...] 'diversity profiling' problem [...] optimization algorithms have been used [...]. Some attempts have been made to postprocess data [from] diversity profile calculations, to determine frequently occurring reagents. [...] the theoretical [...] reagent selection criterion from the diversity function calculation. We have attempted [...] application of such methodology [...] scoring function calculation [...]. This has two primary advantages. [First, it is] possible to make selections in reagent space while [diversity] is calculated in product space. Thus, [the user has] direct control over the number of reagents [selected] from each component pool of a combinatorial synthesis, rather than relying on the Chem-Diverse [cherry-picking] approach described above. This has been implemented in HARPick (Figure 2).

Figure 2. Structure of the HARPick program. The basic procedure is centred around a simulated annealing [...] function, which is used to make random [selections of] reagents from each of the components of [the] combinatorial [library]. Some of the [key] features of the program include the following: the separation of compound selection from the optimization [...], allowing selection [in] reagent space and [diversity calculation in] product space; the facility to force the inclusion of [particular reagents] from [any] component in the calculation, allowing user-defined [additions] to the selected data set, including, for example, certain reagents deemed essential by the chemist; and the capability to weight each of the functions used in the diversity calculation to suit user requirements. See the Methodology section for more details.

Flexible Scoring Functions. [...] properties. Therefore, as well as incorporating pharmacophores as our primary descriptor, it is also possible to [include] additional secondary descriptors. The reason [for this is that such] properties need not [necessarily] be made optimally diverse. Rather, they may [be used to] ensure [...].



[...] descriptors have been applied and are [described below]. For this work, the following definitions are employed [for the] pharmacophore center, pharmacophore distance, and conformational search parameters. Each pharmacophore is defined by three interaction centers [drawn from] seven center types: (i) hydrogen bond donor, (ii) hydrogen bond acceptor, (iii) hydrogen bond donor and acceptor, (iv) aromatic, (v) hydrophobic, (vi) acidic, and (vii) basic, leading to a total of 84 combinations. Each triangle edge distance [is] separated into [...] bins, leading to a total of 184 884 geometrically accessible pharmacophores. The number of [distance bins] used in the key creation has been adjusted [...] instead of the 31 used in the default version of Chem-Diverse. This increased coarseness is felt to be [justified by] the large rotational increments of the Chem-X conformational search procedures. [The bins] have been tailored to approximate the [...] tolerance determined experimentally for 3D database [searching].
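The figure of 84 follows from counting unordered triples of center types with repetition allowed, as the Python snippet below checks. The labels T1 to T7 are placeholders for the seven center types; the 184 884 total additionally depends on the distance binning and geometric accessibility rules, which this sketch does not attempt to reproduce.

```python
from itertools import combinations_with_replacement
from math import comb

# Placeholder labels for the seven center types; only their number matters here.
CENTER_TYPES = ["T1", "T2", "T3", "T4", "T5", "T6", "T7"]

# A three-point pharmacophore is an unordered triple of center types with repeats allowed
# (e.g. two acceptors and an aromatic ring), i.e. a multiset of size 3.
type_triplets = list(combinations_with_replacement(CENTER_TYPES, 3))

print(len(type_triplets))                   # 84
print(comb(len(CENTER_TYPES) + 3 - 1, 3))   # multiset formula C(n + k - 1, k) gives 84 as well
```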

[Because] stochastic optimization [is used], the pharmacophore keys for each molecule need to be [calculated once,] stored, and accessed as and when required. It is [impractical] to use standard Chem-Diverse keys [for this, since each] requires around [...] of disk space. To overcome this, a Chem-X PCL (program control language) script was written which, [for each] molecule [in the] data set, extracts the Chem-Diverse [key], decodes [it], and writes out the individual pharmacophores found for the structure. Since the key for [a single] structure is sparsely [populated], [large savings in disk] space can be made with such an approach, [because] absent pharmacophores, which make up [most of the key], are ignored. Each pharmacophore requires 4 bytes of space: [...] plus 1 for the pharmacophore type. [Note] that, for each molecule, no single pharmacophore [is recorded] more than once. This prevents particularly promiscuous [molecules from] skewing the pharmacophore distributions.

Many, if not most, pharmacophores present in [a molecule are small relative] to the largest pharmacophore [in that molecule] and are thus unlikely to explain the binding of that molecule to a particular receptor. [It would be] useful if a technique were employed to remove [such] pharmacophores. [Chem-Diverse] provides a method which divides the area [of each pharmacophore] triangle by the number of heavy atoms [in the molecule]. Any triangles falling below [a user-defined] ratio for this value are removed from the [key]. The problem with such an approach is that the relationship between heavy atom count and pharmacophore triangle area is purely empirical. As a consequence, if the required ratio is set high to remove a [significant] number of such "insignificant" pharmacophores, some molecules have all their pharmacophores deleted. This happens when the largest pharmacophore area [of a molecule] is small relative to the [number of heavy atoms]. To get around this problem, we have implemented an alternative, self-consistent method for [filtering pharmacophores]. The technique allows the user to set the minimum ratio of each pharmacophore perimeter to the largest pharmacophore perimeter in the current structure. Since all [calculations] are carried out with respect to the internal pharmacophores of each individual [molecule], no [molecule] can lose all its pharmacophores.
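The difference between an absolute area-per-heavy-atom cutoff and the self-consistent perimeter-ratio filter can be shown with a small Python sketch. The triangle edge lengths and heavy-atom count are hypothetical, the area filter is a simplified reading of the Chem-Diverse option rather than its actual implementation, and the 0.4 and 0.7 thresholds are the values quoted in the Results.

```python
import math

def triangle_area(a, b, c):
    """Heron's formula for a triangle with edge lengths a, b, c."""
    s = (a + b + c) / 2.0
    return math.sqrt(max(0.0, s * (s - a) * (s - b) * (s - c)))  # clamp tiny negative rounding

def area_per_heavy_atom_filter(triangles, heavy_atoms, min_ratio):
    """Keep triangles whose area / heavy-atom count reaches min_ratio (absolute, empirical cutoff)."""
    return [t for t in triangles if triangle_area(*t) / heavy_atoms >= min_ratio]

def relative_perimeter_filter(triangles, min_ratio):
    """Keep triangles whose perimeter is at least min_ratio x the largest perimeter in the molecule.
    The largest triangle always passes, so a molecule never loses every pharmacophore."""
    if not triangles:
        return []
    largest = max(sum(t) for t in triangles)
    return [t for t in triangles if sum(t) >= min_ratio * largest]

# Hypothetical molecule: three pharmacophore triangles (edge lengths in angstroms), 30 heavy atoms.
tris = [(3.0, 4.0, 5.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
print(area_per_heavy_atom_filter(tris, heavy_atoms=30, min_ratio=0.4))  # []
print(relative_perimeter_filter(tris, min_ratio=0.7))                   # [(3.0, 4.0, 5.0)]
```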

Implementation of the Diversity Profiling Function. To fully [exploit] the application of a simulated annealing procedure, a number of useful functions have been incorporated into the diversity evaluation routine:

(i) [...] within the [...] parameters employed by Chem-Diverse. The function keeps a count of the number of pharmacophore bins occupied in the selected set of products. We are also employing a nonbinary [representation] of pharmacophore space, which means that not only do we know which pharmacophores are hit, but also how many times. The Unique function corresponds to the number of nonzero variables in our pharmacophore integer array.
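A nonbinary key of this kind maps naturally onto a counter. The Python sketch below, with hypothetical pharmacophore ids, counts how many selected molecules express each pharmacophore (each molecule contributing at most once per pharmacophore, as described earlier) and derives Unique as the number of nonzero entries; `total_pharmacophores` is our reading of the TotPharm quantity used later.

```python
from collections import Counter

def pharmacophore_profile(molecule_keys):
    """Nonbinary library key: number of selected molecules expressing each pharmacophore."""
    counts = Counter()
    for key in molecule_keys:
        counts.update(set(key))   # a molecule counts at most once per pharmacophore
    return counts

def unique_score(counts):
    """Unique: number of nonzero entries in the pharmacophore count array."""
    return sum(1 for c in counts.values() if c > 0)

def total_pharmacophores(counts):
    """Sum over all selected molecules of their distinct pharmacophores."""
    return sum(counts.values())

selected = [[1, 2, 3], [2, 3, 3, 4], [4]]   # hypothetical pharmacophore ids per molecule
profile = pharmacophore_profile(selected)
print(dict(profile), unique_score(profile), total_pharmacophores(profile))
```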

(ii) Property partition function calculations. [...] properties which [give a] crude measure of shape [...]. (Note that we do not [claim that these are] necessary [...] shape. They are, however, simple to calculate and describe different aspects of molecular size and shape.) [...] functions can also be modified when using [other descriptors in] diversity analysis. These are (a) number of heavy atoms (ha), (b) largest triangle [...] for all pharmacophores found, and (c) [...] present for all pharmacophores [...]. The minimum and maximum values for [each] property are determined for the entire library of potential products. The resulting property range is divided into equal partitions. During the diversity calculation, each selected [molecule] is assigned to a partition according to [its property value], and the number of molecules in each partition is compared with the number expected [from an even] distribution. The occupancy function is a measure of [how evenly] each bin is occupied [...], forcing the molecules in the generated product [set] to have an [even] distribution of shapes or [sizes], while still [...].



$$\mathrm{Flex} = [\ldots] \qquad (2)$$

where Flex = flexibility score for the [...].

[Given the] descriptions of [...] above, the first step of our procedure is to [calculate] the pharmacophores for all [potential] products and store the results using Chem-Diverse. It is then a simple task to sum the resulting molecular pharmacophore [keys] to produce an overall description of library pharmacophore [coverage]. This descriptor may then [be used] to optimize pharmacophore [coverage in] potential new libraries. To this end, a constraint [term] has been added to the profiling routine [that biases] pharmacophore selection [toward] [...] voids in previously constructed libraries:

$$\mathrm{Conscore} = [\ldots] \qquad (3)$$

where n = number of pharmacophores [in the] molecules selected from the current data set, $s_j$ = score associated with pharmacophore j for the [constraining library], and [...] = number of accessible pharmacophores.

$$s_j = \left[\max\left(0,\ \mathrm{av\,cov} - Oc_j\right)\right]^{v} \qquad (4)$$

where $\max(0, \mathrm{av\,cov} - Oc_j)$ returns the larger of the values 0 and $\mathrm{av\,cov} - Oc_j$, av cov = average pharmacophore count across all occupied pharmacophores in the constraining library, $Oc_j$ = number of molecules containing pharmacophore j in the constraining library, and v = user-defined weight.
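Equation 4 can be evaluated directly from the constraining-library counts. The Python sketch below assumes that av cov is the mean of Oc_j over pharmacophores that occur at least once (our reading of the definition above) and uses hypothetical counts; how the s_j values are combined in eq 3 is not shown.

```python
def average_coverage(constraining_counts):
    """av cov: mean count over pharmacophores occurring at least once in the constraining library."""
    occupied = [c for c in constraining_counts.values() if c > 0]
    return sum(occupied) / len(occupied)

def constraint_score(j, constraining_counts, v=1.0):
    """s_j = [max(0, av cov - Oc_j)]^v: positive only for under-represented pharmacophores."""
    oc_j = constraining_counts.get(j, 0)   # pharmacophores absent from the library have Oc_j = 0
    return max(0.0, average_coverage(constraining_counts) - oc_j) ** v

# Hypothetical constraining-library counts Oc_j (pharmacophore id -> number of molecules).
oc = {101: 10, 102: 2, 103: 0}
print(average_coverage(oc))                                     # 6.0
print([constraint_score(j, oc) for j in (101, 102, 103, 104)])  # [0.0, 4.0, 6.0, 6.0]
```

Note that v should be positive: in Python `0.0 ** 0` is `1.0`, which would give over-represented pharmacophores a nonzero score.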



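$$\mathrm{av\,cov} = \frac{1}{N_{\mathrm{occ}}}\sum_{j:\,Oc_j>0} Oc_j \qquad (5)$$

where $N_{\mathrm{occ}}$ = number of occupied pharmacophores in the constraining library.

$$\mathrm{Partscore} = 1 - \frac{\frac{1}{P}\sum_{i=1}^{P}\left|n_i - \bar{n}\right|}{D_{\max}} \qquad (1)$$

where $D_{\max}$ = maximum possible mean absolute deviation (which would occur when all molecules occupy a single partition), $\bar{n}$ = mean molecule occupation across all partitions, $n_i$ = number of molecules occupying partition i, and P = number of partitions.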
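The partition score can be sketched in Python as below, assuming the form of eq 1 given above (one minus the mean absolute deviation of partition occupancies, normalized by its maximum, which for N molecules in P partitions is 2N(P - 1)/P^2). The heavy-atom counts are hypothetical; the 15 to 51 range matches the SDF size filter described in the Experimental Section.

```python
def assign_partitions(values, lo, hi, n_parts):
    """Split [lo, hi] into n_parts equal-width partitions and return each value's partition index."""
    width = (hi - lo) / n_parts
    return [min(int((v - lo) / width), n_parts - 1) if width > 0 else 0 for v in values]

def partscore(values, lo, hi, n_parts):
    """1 - (mean absolute deviation of partition occupancies) / (its maximum possible value)."""
    if n_parts < 2:
        raise ValueError("need at least two partitions")
    counts = [0] * n_parts
    for i in assign_partitions(values, lo, hi, n_parts):
        counts[i] += 1
    n = len(values)
    mean = n / n_parts
    mad = sum(abs(c - mean) for c in counts) / n_parts
    # Worst case, all n molecules in one partition: ((n - mean) + (P - 1) * mean) / P.
    mad_max = 2 * n * (n_parts - 1) / n_parts ** 2
    return 1.0 - mad / mad_max

heavy_atoms = [15, 22, 30, 44]   # hypothetical heavy-atom counts of four selected molecules
print(partscore(heavy_atoms, lo=15, hi=51, n_parts=4))
```

An even spread over the partitions scores 1.0 and a selection piled into one partition scores 0.0, which is the behaviour the occupancy function needs inside a weighted product score.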

(iii) Flexibility. A [...] of [...] flexibility, the number of calculable conformations for each molecule (as determined by the conformational search criterion used in the profiling PCL script), has been [...].



[...] occupied in pharmacophore [space] [...].

[...] To allow the user to weight the score against promiscuous molecules, [...] the number of pharmacophores [...]: the total number of pharmacophores present in all selected molecules (TotPharm).

(vi) Finally, the [...] energy function [...] of acceptability (for example, based on maximum [...] pharmacophore promiscuity). All these [terms] are combined to create our overall score:

$$\mathrm{Score} = \frac{\mathrm{Unique}^{w} \times \mathrm{Conscore} \times \mathrm{Partscore}_{ha}^{x} \times \mathrm{Partscore}_{pa}^{x} \times \mathrm{Partscore}_{pp}^{x} \times [\ldots]}{\mathrm{TotPharm}^{z}} \qquad (6)$$

where w, x, y, and z are user-defined weights.
Experimental Section 

[...] Chem-Diverse [...] with respect to its nearest relative [...]. [Structures with between] 15 and 51 heavy atoms (excluding halogens) [were retained], a simplistic criterion for choosing molecules with [druglike size]. Note that we employ this technique as a general screen. In the case of the SDF it is less relevant, although many of the smaller molecules removed are of less interest (e.g., antiseptics), while most of the [larger ones] are peptides. All these structures [were converted to 3D] using Concord.

A simple hypothetical combinatorial [library] comprising two components undergoing amide formation (Figure 3) [was constructed]. The acids and amino acids for the [library] were [taken] from an available-chemicals database. The reagents selected were constrained to have between 8 and 25 heavy atoms (excluding halogens), so that the size of the [products] matched that of the molecules selected from the SDF (again a "druglike" size). Molecules with a heteroatom ratio (ratio of the number of heteroatoms, excluding halogens, to the total number of heavy atoms in the molecule) outside the [...]-0.5 range (the range in which >90% of the [SDF molecules] fall) were then removed, as were those containing undesirable (e.g., toxic and reactive) groups.
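The reagent size and heteroatom filters can be expressed as a short Python sketch. Molecules are represented here simply as lists of element symbols (hypothetical inputs); the heteroatom ratio uses all non-hydrogen atoms as the denominator, which is one reading of "total number of heavy atoms", and the lower bound of the ratio window (`min_ratio`) is a hypothetical default because it is not legible in the source. Screening for toxic or reactive groups is not covered.

```python
HALOGENS = {"F", "Cl", "Br", "I"}

def size_count(elements):
    """Heavy atoms excluding halogens (the size measure used for the 8-25 window)."""
    return sum(1 for e in elements if e != "H" and e not in HALOGENS)

def heteroatom_ratio(elements):
    """Non-C, non-H, non-halogen atoms divided by all non-hydrogen atoms."""
    heavy = [e for e in elements if e != "H"]
    hetero = [e for e in heavy if e != "C" and e not in HALOGENS]
    return len(hetero) / len(heavy) if heavy else 0.0

def keep_reagent(elements, min_size=8, max_size=25, min_ratio=0.0, max_ratio=0.5):
    """Size and heteroatom-ratio filter; min_ratio = 0.0 is a hypothetical default."""
    return (min_size <= size_count(elements) <= max_size
            and min_ratio <= heteroatom_ratio(elements) <= max_ratio)

benzoic_acid = ["C"] * 7 + ["O"] * 2 + ["H"] * 6        # C7H6O2
glycine = ["C"] * 2 + ["N"] + ["O"] * 2 + ["H"] * 5     # C2H5NO2
print(keep_reagent(benzoic_acid), keep_reagent(glycine))  # True False
```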

[...] although we would [...], the total number of reagents available to many combinatorial libraries (including the one used here) makes [product-based profiling] prohibitive. We must therefore filter the reagents to a size we can deal with in product space (< [...] products), using simpler and more rapidly calculable [descriptors]. To this end, reagents were clustered using [...] and Wootton spheres at a similarity level of [...]. This methodology is designed to provide a [...] selection. [The reagents] were converted to 3D [...]; [some structures] caused problems upon conversion to 3D, and these were removed. This left 67 amino acids and 505 acids, [giving a library] size of 33 835 products. Both the SDF and hypothetical libraries were then profiled using Chem-X/Chem-Diverse (July 96 version) software, employing our own PCL script. The resulting molecular pharmacophore profiles, heavy atom counts, and flexibility values were stored on disk. The SDF required around 1 day to be profiled on an SGI R10000 (upon which all calculations were [run]). Around 6 days were required to profile the hypothetical library structures. Five experiments were undertaken on these data sets to analyze the performance of HARPick.

[...] of the SDF data set was calculated [...] typical [...] distribution across a molecular collection.

[...] of [...] under various conditions.

[...] the hypothetical library were [used to examine] the performance of HARPick when selecting compounds from multicomponent data sets.

[...] the ability of HARPick to [...].



Figure 3. Hypothetical two-component library: [...] amino acid (component 1) and acid (component 2) reagents [...].




Figure 4. [...] No. of times pharmacophore hit [...]. Total number of pharmacophores in the [library] = [...]7 745. Number of geometrically accessible pharmacophores = 184 884. Number of different pharmacophores [present] = 126 553. Thus over 68% (126 553/184 884) of the geometrically accessible pharmacophores are present in the library.



Results 



(1) [...] in the same [...]. The resulting histogram of pharmacophore distributions is illustrated in Figure 4.

(2) [Selections] from the SDF [were] made using Chem-Diverse and HARPick. [...] smaller pharmacophore [...] the selection of [...]. The maximum pharmacophore [overlap] percentage permitted between each [molecule key] and the total library key was set to 60%. All molecules in the library were passed, with any passing the [overlap] criterion being added to the selected structures. The calculation resulted in the selection of 372 [molecules]. [HARPick] calculations were then undertaken on the same [data set] to select an identical set size (372) using different diversity criteria:

(a) Maximize internal pharmacophore diversity only. The following values and weights were applied to the diversity function (see eqs 3 and 6): Conscore = 1, w = 1, [...].

(b) Maximize the internal pharmacophore diversity while maximizing the shape partition scores. Diversity function values and weights applied: Conscore = 1, [...].
Table 1. Results from Study 2

calculation | no. of unique pharmacophores | total no. of pharmacophores | property partition function scores (pa, ha, pp) | no. of calculable conformers | run parameters (Conscore, w, x, y, z)
Chem-Diverse 2(i) | 49 829 {71 096} | 68 987 | 0.33, 0.66, 0.43 | 2.2 × 10^[...] | not applicable
HARPick 2(ii)(a) | 105 222 | 419 870 | 0.97, 0.69, 0.70 | 4.0 × 10^6 | 1, 1.0, 0, 0
HARPick 2(ii)(b) | 61 913 | 93 977 | 1.00, 0.90, 0.85 | 6.4 × 10^5 | 1, 1, 0.75, 0
HARPick 2(ii)(c) | 70 656 {87 566} | 137 801 | 0.90, 0.79, 0.54 | 2.4 × 10^6 | 1, 2, 0.5, 0.75, 0.33
random 2(iii) | 39 625 | [...] | 0.54, 0.47, 0.34 | [...] | not applicable



(c) [...] partition scores and minimizing the [...]. Diversity function values and weights applied: Conscore = 1, w = 2, x = 0.5, y = 0.75, z = 0.33.

(iii) Random selections of 372 molecules were also collated, [and] the average data [...].

Note: For all the above HARPick calculations, structure pharmacophores with a perimeter ratio of less than 0.7 relative to the largest pharmacophore perimeter found in the same molecule were removed. The resulting diversity profiling data are shown in Table 1 and Figure 5.

(3) [Structures with no] calculable conformers were removed from the SDF, and the remaining 15 716 structures were again [processed] using Chem-Diverse and HARPick [...] a subset based on [...]. For this calculation, molecules were [...] as recommended by Chemical Design [for] [...] data sets. The maximum pharmacophore [overlap] percentage permitted between each [molecule] key and the total library key was once [again] [...], and a pharmacophore area to heavy atom ratio of 0.4 was enforced. The calculation resulted in the selection of 400 [molecules]; Chem-Diverse achieved this after processing 4500 structures. [HARPick calculations were then undertaken on the] SDF data [set] to select an identical set size (400) using various diversity criteria:

(a) Maximize internal pharmacophore diversity only. Values and weights applied: Conscore = 1, [...].

(b) Maximize the internal pharmacophore diversity while maximizing the shape partition scores [...]. Diversity function values and weights applied: [...].

(c) [...] Diversity function values and weights applied: Conscore = 1, w = 2.0, x = 0.5, y = 0.75, z = 0.33.

(d) Maximize internal pharmacophore diversity while trying to minimize the pharmacophore promiscuity of the selected molecules. Diversity function values and weights applied: Conscore = [...].

(e) Try to balance [...].



All primary HARPick [...].



[Figure 5: occupation frequency histograms for the full SDF library and the Chem-Diverse selection.]

Figure 5. [...] from study 2. Left-hand [...].

(iii) Random selections of 400 molecules were also collated.

Note: For all the above HARPick calculations, structure pharmacophores with a perimeter ratio of less than 0.7 relative to the largest pharmacophore perimeter found in the same molecule were removed. The resulting diversity profiling data are given in Table 2.

(4) The fourth [study] was undertaken to illustrate [the differences] between HARPick and Chem-Diverse in multicomponent product profiling. On this occasion, [...] in terms of reagent selection [...]. The data set chosen for analysis was a subset of the [hypothetical] library shown in Figure 3: [...] the first 19 reagents from [...] component 1 and the [...], [which required] [...] days of CPU [time] to be profiled. [...] experiments were executed using this data set. (i) [Chem-Diverse was used to] select a subset [...]. [The] molecules were [ordered] [...] using the Chem-X "sample" command, which [...] fingerprints [...] at the start of the list. The maximum percentage [overlap] permitted [between each] molecule and the total library key was set to [...].



Table 2. Results from Study 3

calculation | no. of unique pharmacophores | total no. of pharmacophores | property partition function scores (pa, ha, pp) | no. of calculable conformers | run parameters (Conscore, w, x, y, z)
Chem-Diverse 3(i) | 37 811 | 58 391 | 0.8, 0.67, 0.47 | 3.7 × 10^5 | not applicable
HARPick 3(ii)(a) | 73 999 | 237 656 | 0.82, 0.55, 0.57 | 1.1 × 10^6 | 1, 1.0, 0, 0
HARPick 3(ii)(b) | 55 677 | 99 180 | 0.89, 0.92, 0.60 | 4.9 × 10^5 | 1, 1, 0.75, 0
HARPick 3(ii)(c)^c | 55 156 {61 207} | 100 997 {109 371} | 0.80 {0.88}, 0.86 {0.92}, 0.57 {0.61} | 3.1 × 10^5 {3.1 × 10^5} | 1, 2, 0.5, 0.75, 0.33
HARPick 3(ii)(d) | 50 811 | 69 727 | 0.46, 0.55, 0.36 | 4.1 × 10^5 | [...] 0, 1, 0
HARPick 3(ii)(e) | 49 994 | 78 191 | 0.71, 0.71, 0.55 | 3.1 × 10^5 | 1, 1.75, 0.25, 1, 0.33
random 3(iii) | 26 992 | 56 102 | 0.45, 0.37, 0.28 | 3.1 × 10^5 | not applicable

[...] iterations at a speed of [...]; calculation [...] 4 h. All [...].



Table 3. Results from Study 4

calculation | no. of unique pharmacophores | total no. of pharmacophores | no. of reagents selected from component 1 | no. of reagents selected from component 2 | no. of calculable conformers | run parameters (Conscore, w, x, y, z)
Chem-Diverse 4(i) | [...] | [...] | [...] | [...] | [...] | not applicable
HARPick 4(ii) | [...] | [...] | [...] | [...] | [...] | [...]
[...] (12%) [...] 203 061 (19%) [...] 51 837 (5%) [...] 0, 1, 0, 1, 0 [...] 51 837 [...] = 1, [...] = 10

HARPick calculations ran for [...] around 30 min [...] 20-25 min [...]. See eqs 3-6.

[...] ([...] iterations per second). [...] was set to 10, which lead[s] to [...] pharmacophore occupation level [...] the constraint term. The minimizer, rather than the full simulated annealing procedure, was employed for this calculation. [...] 50 from component [...]. [...] structure pharmacophores with a perimeter ratio of less than 0.7 relative to the largest pharmacophore perimeter found in the same [molecule] were removed. The resulting diversity profiling data are given in Table 4.



[...] All molecules in the library were passed [...]. Rather than being allowed to [...] (as we [...]), on this occasion no pharmacophore perimeter filter was applied to the molecular pharmacophore descriptors. Results for study (4) are given in Table 3.

(5) [...] constrained [libraries], [the] two-component data set (Figure 3) was [...]. [A] calculation [was run] with the [...] 20 reagents from component 1 and 50 from [component 2]:

(a) Maximize internal pharmacophore diversity. The following values and weights were applied to the diversity function: Conscore = [...], [...] = 5, z = 0.

(b) Starting with the selections from (5)(a), maximize internal pharmacophore diversity. At the same time, weight the calculation [toward] pharmacophores which [...] the SDF library profile shown in Figure 4. The following [values and weights were applied to the] diversity function: [...].



Discussion 

[...] the SDF [...] illustrates the problems [of] a binary [...]. The library studied contains only 20 168 molecules, [yet even when the] strict perimeter filter was applied, [...] of the geometrically accessible pharmacophores are [...] in the data set. It should be noted that [...] of these accessible pharmacophores [would be] considered unsuitable in medicinal chemistry terms (e.g., acid-acid-acid, lipophilic-lipophilic-lipophilic, all three distances > 20 Å, etc.), so it is almost certain that the actual occupation level is [...]. [When the] perimeter filter is applied, over 15 million [...] in the library, [...] around 40 000 are hit more than [...] times. [To take] advantage of [this] in library design, it is evident that [...] exploit this distribution information (HARPick attempts to [do so] using eq 3).
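The difference between a binary key and occupancy counts (how often each pharmacophore is hit) can be illustrated with a toy example. This is a sketch under stated assumptions: each molecule is represented as a set of integer pharmacophore bin indices, and the helper names are hypothetical.

```python
from collections import Counter


def binary_key(profiles):
    """Presence/absence key: every bin hit by at least one molecule."""
    return set().union(*profiles)


def occupancy_counts(profiles):
    """How many molecules hit each pharmacophore bin (each bin counted once per molecule)."""
    counts = Counter()
    for bins in profiles:
        counts.update(set(bins))
    return counts


# Three toy molecules, each described by the bins of its pharmacophores.
profiles = [{1, 2, 3}, {2, 3}, {3, 4}]
key = binary_key(profiles)            # {1, 2, 3, 4}: bin 3 looks no different from bin 1
counts = occupancy_counts(profiles)   # bin 3 is hit by all three molecules
heavily_hit = sorted(b for b, n in counts.items() if n > 1)  # [2, 3]
```

The binary key discards exactly the information that separates a rarely occupied pharmacophore from a ubiquitous one.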

Studies 2 and 3 (Tables 1 and 2) clearly illustrate the many advantages of a customizable [...] problem. Both Chem-Diverse and HARPick are able to considerably improve molecular [...] based on pharmacophore count, compared with [...] selections (2(iii) and 3(iii)). HARPick calculations 2(ii)(a) and 3(ii)(a), which were set to purely maximize pharmacophore diversity, are able to find around [...] of the pharmacophores of the corresponding [...] runs. As one would expect, however, the [molecules] chosen are substantially more flexible (as evidenced by the total pharmacophore and calculable conformer counts) and are also not evenly partitioned in our simplified version of shape space. The remaining HARPick calculations illustrate how we can address these various [...] features through simple customization of the diversity scoring function. Calculations 2(ii)(b) and 3(ii)(b) [show that] the inclusion of the partition function in the diversity score [can] considerably improve the shape property partitioning, while still allowing good pharmacophore diversity.

Figure 5 illustrates the effect of including the partition function (eq 1) in the diversity score. Chem-Diverse is seen to broadly follow the trend present in the whole SDF library with respect to perimeter distribution. Study 2(ii)(a), which [purely maximizes] internal pharmacophore diversity, shows a preponderance of larger perimeter [...]. Studies 2(ii)(b) and 2(ii)(c), however, which include partition weighting in their diversity score, show a significantly more even partitioning of pharmacophore perimeters.
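One simple way to quantify how evenly a selection is spread over perimeter bins is a normalized Shannon entropy (evenness). This is a swapped-in measure for illustration only, not the partition function of eq 1, which is not reproduced in this section; the bin width and function names are assumptions.

```python
import math
from collections import Counter


def bin_perimeters(perimeters, width=2.0):
    """Histogram of pharmacophore perimeters using fixed-width bins (width is an assumption)."""
    return Counter(int(p // width) for p in perimeters)


def evenness(counts, n_bins):
    """Shannon entropy of bin occupancies divided by log(n_bins): 1 is perfectly even, 0 is a single bin."""
    total = sum(counts.values())
    if total == 0 or n_bins < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)
    return h / math.log(n_bins)


skewed = bin_perimeters([13.0, 13.5, 14.2, 15.9, 12.1, 12.5])  # all in bins 6 and 7
spread = bin_perimeters([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])       # one value in each of bins 0-5
```

A selection that favours large perimeters, as described for study 2(ii)(a), concentrates its counts in a few bins and scores low on such a measure.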

Studies [...] and 3(ii)(c) illustrate the [...] of a flexibility minimization function, which substantially reduces the number of calculable conformers in the selected data sets. Even when we [...] the unique pharmacophore count [...] the Chem-Diverse calculation [...], the HARPick selection [...] substantially more pharmacophores. HARPick [...] shows that increasing the weighting [of the] denominator term (eq 6) can dramatically [...] the [...] of unique pharmacophore occupancy, i.e., reduce the pharmacophore [...] count.

[...] illustrates that even with the [addition of] constraint functions, the resulting HARPick selection is able to outperform the Chem-Diverse selection [in] partition scores [...]; extra pharmacophore types are found, and [...] comparable pharmacophore promiscuity rates (64% for HARPick versus 65% for Chem-Diverse). Another Chem-Diverse problem [is shown in] study 3 [by] the fact that only 4500 structures were [...] the required selection [...]. The only way to control the size of the selected data set [...] structures have been chosen. [Unlike] HARPick, there is no way to sample the whole of available structure space [in this] procedure (study 3).

Chem-Diverse selects [...] from the available product data; it [...] the most efficient combination [...]. [...], however, Chem-Diverse selected products containing 36 different constituents. This could be described as a selection of 61% efficiency, setting 15 and 69, the smallest and largest number of reagents which could be selected from 50 products of this data set, as 100% and 0% efficient ((69 - 36) × 100/(69 - 15)). In contrast, it is a simple matter in HARPick to impose the requirement that a 50 product selection for a two component reaction contain 10 reagents from component 1 and 5 reagents from component 2, so that a 100% efficient selection can be made. In this case the resulting data set is comparable to the significantly less efficient set chosen by Chem-Diverse. This is of substantial importance, since many chemists wish to create multiple libraries of 100% efficiency, both on grounds of [...] and ease of robot programming.
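The efficiency figure quoted above follows from a linear scale between the smallest and largest possible reagent counts for a 50-product selection. The sketch below reproduces that arithmetic; the minimum is computed for a complete two-component grid, while the maximum (69) is taken from the text because it depends on the reagent pools, which are not given here. Function names are hypothetical.

```python
def min_reagents_full_grid(n_products):
    """Fewest reagents (a + b) for a complete two-component grid with a * b == n_products."""
    return min(a + n_products // a
               for a in range(1, n_products + 1) if n_products % a == 0)


def reagent_efficiency(n_used, n_min, n_max):
    """100% when the selection uses n_min reagents, 0% when it uses n_max."""
    if n_max == n_min:
        raise ValueError("n_max and n_min must differ")
    return (n_max - n_used) * 100.0 / (n_max - n_min)


n_min = min_reagents_full_grid(50)                   # 15, e.g. 10 x 5 reagents
n_max = 69                                           # upper bound quoted in the text
chem_diverse = reagent_efficiency(36, n_min, n_max)  # (69 - 36) * 100 / 54, about 61%
harpick = reagent_efficiency(10 + 5, n_min, n_max)   # 100.0
```

Fixing the per-component reagent counts, as HARPick allows, pins the selection to the n_min end of the scale by construction.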
The final study illustrates the advantages of applying constraints to diversity profiling calculations. As one would expect, both constrained and unconstrained HARPick runs outperform random selection. From the perspective of filling diversity voids, the unconstrained HARPick search (5(i)(a)) does significantly [better than the] random [selection] (study 5(ii)) because, although the [...] profile contains many pharmacophores, its relatively small size still leaves substantial holes in pharmacophore space. As a consequence, maximizing pharmacophore spread is bound to increase the constraint score. Nevertheless, as soon as we constrain the selection to fill these voids (calculation 5(i)(b)), [...] improvements are observed. While the diversity of the system is maintained, the average constraint score per pharmacophore type is found to double, and the number of constraint scoring bins [hit] as a proportion of the total pharmacophore [count] increases from 12% [...]. With Chem-Diverse it is currently only possible to [treat] pharmacophores which have not been hit [in existing] libraries as voids (the binary key [...]). Also, there is no way to couple intra- and interlibrary [...] in a single calculation. Combine this with the [...] days required for a single Chem-Diverse calculation, and it is clear that a [...] calculation would be impractical.
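The void-filling idea can be illustrated by measuring what fraction of the pharmacophore bins hit by a selection are absent from an existing library profile. This is a sketch, assuming profiles are sets of bin indices and that a void is simply a bin missing from the reference profile; HARPick's constraint term is not reproduced here and may weight bins differently. The function name is hypothetical.

```python
def void_fraction(selected_profiles, reference_bins):
    """Fraction of distinct bins hit by the selection that do not occur in the reference profile."""
    hit = set().union(*selected_profiles)
    if not hit:
        return 0.0
    return len(hit - reference_bins) / len(hit)


reference = {1, 2, 3}          # bins already covered by existing libraries (toy data)
selection = [{1, 4}, {2, 5}]   # two candidate products
fraction = void_fraction(selection, reference)  # bins 4 and 5 are voids: 2 of 4 -> 0.5
```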

[...] of complex [...], one would expect the number of unique pharmacophores found in study [...] calculations [...] (calculation 5(i)(a)) [...] constrained search [...]; we find that the [...] search (calculation 5(i)(b)) [...] significantly [...] unique pharmacophores [...]. The [...] score (eq 3) function [...]; its [...] nature allows more pharmacophores [...] to contribute to the diversity score. As a consequence, [...] become [...] pharmacophores [...] than 125 [...] × 10^6 [...] the resultant chosen set (data not shown). [This] illustrates the relationships that can [arise between] terms in a complex scoring function and emphasizes the need for careful setting of [...] to ensure that optimal selections are [...].
An interesting feature of the HARPick [program] is its speed. Nearly all runs [...] 30 [...]. A direct comparison with a single Chem-Diverse run is difficult, as the profiling calculation required to generate the data for HARPick [takes] essentially as long as a single Chem-Diverse study. It is of extreme importance, however, to [realize] that a single [...] rarely suffices [when] profiling a given library. Problems such as [...], a dislike of certain reagent types, [or a poor] balance of physicochemical properties in the selections can all lead to the requirement for multiple profiling calculations. If one considers [this] in the context of the hypothetical library studied here, the implications are clear. Since a single Chem-Diverse calculation requires nearly 6 days, [...] profiles of [this] sort become [...]. Conversely, with HARPick, once the pharmacophore [profiles] have been calculated and stored, [...], and in some cases longer runs can improve [...]. We have also found that it is possible to obtain good [...] in a short time span. This reflects the [...] nature of the annealing protocol. We have [...] several [...] schedules but could not [...]. We recommend using a quick cooling [schedule] to [...] an approximate [...] minimum, [examining] the results before trying a longer schedule.
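The fixed-size reagent selection and annealing schedule discussed above can be sketched as a generic simulated annealing over two reagent lists. This is a minimal illustration, not the HARPick implementation: it assumes each product's pharmacophore bins are precomputed in a dictionary, scores a selection by the number of distinct bins over all products of the n1 × n2 grid, and uses a Metropolis acceptance rule with geometric cooling. All names and parameter values are hypothetical. Because only whole reagents are swapped, every candidate is a complete grid, i.e. 100% efficient by construction.

```python
import math
import random


def unique_count(sel1, sel2, profile):
    """Distinct pharmacophore bins over every product (i, j) of the sel1 x sel2 grid."""
    bins = set()
    for i in sel1:
        for j in sel2:
            bins |= profile[(i, j)]
    return len(bins)


def anneal(pool1, pool2, n1, n2, profile, steps=2000, t0=5.0, cooling=0.995, seed=0):
    """Maximize unique_count by swapping single reagents; returns (best_score, [sel1, sel2])."""
    rng = random.Random(seed)
    pools = (list(pool1), list(pool2))
    sel = [rng.sample(pools[0], n1), rng.sample(pools[1], n2)]
    score = unique_count(sel[0], sel[1], profile)
    best = (score, [list(sel[0]), list(sel[1])])
    t = t0
    for _ in range(steps):
        c = rng.randrange(2)                              # component to mutate
        unused = [r for r in pools[c] if r not in sel[c]]
        if unused:
            pos = rng.randrange(len(sel[c]))
            old, sel[c][pos] = sel[c][pos], rng.choice(unused)
            new = unique_count(sel[0], sel[1], profile)
            delta = new - score                           # positive = improvement
            if delta >= 0 or rng.random() < math.exp(delta / t):
                score = new
                if score > best[0]:
                    best = (score, [list(sel[0]), list(sel[1])])
            else:
                sel[c][pos] = old                         # reject: undo the swap
        t *= cooling                                      # geometric cooling
    return best


# Toy data: product (i, j) hits bin i % 3 and bin 10 + j % 2, so the best
# 3 x 2 selection covers all five bins.
pool1, pool2 = range(6), range(4)
toy_profile = {(i, j): {i % 3, 10 + j % 2} for i in pool1 for j in pool2}
best_score, (chosen1, chosen2) = anneal(pool1, pool2, 3, 2, toy_profile)
```

A smaller `cooling` factor and fewer `steps` give the kind of quick schedule recommended above; rerunning with a slower schedule and comparing best scores is a cheap convergence check.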
We [...] that diversity measures [...] absolute truth; a near-global minimum [...] as [...] a global minimum [...] solution. Repeated calculations for study 3 (data not shown) using identical annealing [...] produced selections with an average of 26% of molecules in common.
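The run-to-run agreement quoted above is a set overlap averaged over pairs of repeated runs. A minimal sketch, assuming equally sized selections of molecule identifiers; the function names and toy identifiers are hypothetical.

```python
from itertools import combinations


def percent_in_common(run_a, run_b):
    """Percentage of molecules shared by two equally sized selections."""
    a, b = set(run_a), set(run_b)
    if not a or len(a) != len(b):
        raise ValueError("selections must be non-empty and of equal size")
    return 100.0 * len(a & b) / len(a)


def mean_pairwise_overlap(runs):
    """Average percent_in_common over all pairs of repeated runs."""
    pairs = list(combinations(runs, 2))
    return sum(percent_in_common(x, y) for x, y in pairs) / len(pairs)


runs = [["m1", "m2", "m3", "m4"],
        ["m1", "m2", "m5", "m6"],
        ["m1", "m7", "m8", "m9"]]
overlap = mean_pairwise_overlap(runs)  # (50 + 25 + 25) / 3, about 33.3%
```

A low overlap need not mean poor convergence when near-identical structures can substitute for one another, which is why the run score statistics are the more informative comparison.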
[...] of the longer [...] (data not shown) lead to selections [with ...] of molecules in common. Achieving a 100% [...] for this study is difficult, however, as no attempt has been made to remove very similar molecules from the system. It would thus be simple for HARPick to substitute near-identical structures into [a selection] and achieve the same quality profile. [...] run statistics (data [...]) were found to be near-indistinguishable. For calculations [with] preclustered (and hence nonidentical) reagents, as in study [...], repeated runs were able [to find] the same data set selections. [...] however, the [...] global minimum will be tied to many factors, [including] the size and constitution of the [...]. Nonetheless, these studies show that it is possible to [...] over relatively short timeframes. It should be emphasized [...] practical profiling calculations.

Indeed, the [...] library used here contains unusually [...] 33 [...] molecules [...] × 10^6 pharmacophores across [...] molecules with the pharmacophore perimeter filter applied), with the average structure containing [...]. [...] the complexity of the [...] nature, 90% of CPU time is taken up in pharmacophore evaluation, which, to a first approximation, [...] the number of pharmacophores [...]. [A] less compute-intensive primary descriptor [could] lead to a significant increase in program speed. Furthermore, although collating the initial [...] data for HARPick [for] the full product [data set took] 6 days CPU in the case of the [library] studied here, the profiling procedure [lends itself] to parallel calculation, [with] the profiling calculation [split across] available CPUs, dramatically reducing the [time] required to collate the pharmacophore data. [...] the number of products in the set [and the] granularity of the pharmacophore analysis. In the [...] phase, the number of products evaluated [...] varies as the square of the number of reagents. The rate of convergence depends on the [annealing] schedule and the redundancy in the set ([...] the separation of the global minimum from [other] minima). Of these factors, our guess is that the controlling one will be the number of reagents. [...] the algorithm has an approximately [...] complexity. However, we note that the RAM requirements may well prove a more demanding constraint than the algorithmic complexity. Currently, the program stores [...] present in a product data set [...] for ease of access. To store the 35 million pharmacophores of the combinatorial library studied here [requires] around 1[...] Mbytes of RAM. One could envisage a [...] of HARPick, however, where only the [...] set of products are held in RAM. All remaining pharmacophore profiles would be stored on disk and [accessed] as required by the optimization algorithm. With the [...] program structure, [...] would have been required to store the [...].
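The RAM estimate is simple arithmetic. The sketch below assumes, for illustration only, that each stored pharmacophore occupies one 4-byte integer bin index and that pharmacophores are spread evenly across products; the paper's storage format is not described in this section, so the figures are indicative rather than a reproduction of the quoted value. The product count in the usage line is hypothetical.

```python
def storage_megabytes(n_items, bytes_per_item):
    """Storage in decimal megabytes (10**6 bytes)."""
    return n_items * bytes_per_item / 1e6


def subset_megabytes(total_items, n_products, n_selected, bytes_per_item):
    """RAM if only the selected products' profiles are held in memory,
    assuming pharmacophores are spread evenly across products (an assumption)."""
    return storage_megabytes(total_items * n_selected / n_products, bytes_per_item)


# Assumption: one 4-byte integer per pharmacophore (not stated in the source).
full_library = storage_megabytes(35_000_000, 4)  # 140.0 MB with every profile in RAM
# Hypothetical 100 000-product library with a 50-product selection held in RAM.
selected_only = subset_megabytes(35_000_000, 100_000, 50, 4)  # about 0.07 MB
```

The gap between the two figures is what makes the disk-backed variant described above attractive.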

[...] needs further investigation. While the [...] protocol used above was found to work [...], the random nature of reagent selection suggests [that the] efficiency of the procedure might be improved [...]. One would imagine that the [...] of reagents possessing properties unique to [...] would [...] lead to an improvement in the [...] set. An algorithm which is able to [...] memory of previous mutation [...] them.

The potential flexibility of the [...] approach is clear. In principle, any descriptor [could be] applied to the scoring functions. One could envisage maximizing functions (e.g., 3D pharmacophores [...]), minimizing functions (e.g., cost per reagent), [...] functions (general shape/log P), and bounding functions ([...] products with properties outside [...] minimum/maximum log P). In principle, a customizable scoring function could be designed, with the user able to choose which properties are included in the scoring routine and the functions used on them. Through careful application of user weightings for each function, the result would be a [...] paradigm. This is currently an area of active research.
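The customizable scoring idea can be sketched as a weighted sum of user-chosen terms, each marked as maximized or minimized, with bounding terms expressed as penalty counts. This is a hypothetical illustration: the term names, product fields (`pharmacophores`, `cost`, `logp`) and weights are assumptions, and the sketch does not reproduce the paper's eqs 3-6.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Term:
    name: str
    func: Callable[[List[Dict]], float]  # maps a selection of products to a number
    weight: float
    maximize: bool = True


def composite_score(selection, terms):
    """Higher is better: maximized terms add, minimized terms subtract."""
    return sum(t.weight * (t.func(selection) if t.maximize else -t.func(selection))
               for t in terms)


def unique_pharmacophores(selection):
    return len(set().union(*(p["pharmacophores"] for p in selection)))


def total_cost(selection):
    return sum(p["cost"] for p in selection)


def outside_bounds(key, lo, hi):
    """Bounding term: number of products whose property lies outside [lo, hi]."""
    return lambda selection: sum(1 for p in selection if not lo <= p[key] <= hi)


terms = [Term("diversity", unique_pharmacophores, 1.0),
         Term("cost", total_cost, 0.25, maximize=False),
         Term("logP bounds", outside_bounds("logp", 0.0, 5.0), 2.0, maximize=False)]
selection = [{"pharmacophores": {1, 2}, "cost": 10.0, "logp": 2.0},
             {"pharmacophores": {2, 3}, "cost": 20.0, "logp": 6.0}]
score = composite_score(selection, terms)  # 3 - 0.25 * 30 - 2 * 1 = -6.5
```

Keeping each term as a separate object lets the user add, drop or reweight properties without touching the optimizer, which only ever sees the single composite score.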



Conclusions 

[...] to tackle the [...] of library design in a manner that answered the needs of our medicinal chemists. The methodologies described [...] inherent [...] of [the] first generation diversity tools. The technique [offers] more efficient [use] of pharmacophore descriptors during product-based reagent selection, [...] incorporation of alternative user [...], plus more extensive profiling [...] designs constrained by already synthesized [...] databases. All of these features [...] a versatile profiling paradigm which should prove extremely useful in library design.

[...] colleagues at [...] for [...] comments and suggestions during the preparation of this manuscript, particularly David Clark, [...] Stephen P[...], [...] Mason.